The Shape of the Primordial Power Spectrum: A Last Stand Before Planck 
We present a minimally-parametric reconstruction of the primordial power spectrum using the 
most recent cosmic microwave background and large scale structure data sets. Our goal is to 
constrain the shape of the power spectrum while simultaneously avoiding strong theoretical priors 
and over-fitting of the data. We find no evidence for any departure from a power law spectral 
index. We also find that an exact scale-invariant power spectrum is disfavored by the data, but this 
conclusion is weaker than the corresponding result assuming a theoretically-motivated power law 
spectral index prior. The reconstruction shows that better data are crucial to justify the adoption of 
such a strong theoretical prior observationally. These results can be used to determine the robustness 
of our present knowledge when compared with forthcoming precision data from Planck. 



I. INTRODUCTION 

The deviation from scale invariance of the primordial scalar power spectrum is a critical prediction of inflation. Unlike other potential signatures, such as tensor modes or non-Gaussianity, it is generic to all inflationary models. It is therefore a vital test of the inflationary paradigm, and we address it with a minimally parametric approach.

Briefly, the idea is as follows. Choose a functional form that allows a great deal of freedom in the form of the deviation from scale invariance (e.g. smoothing splines). Naively fitting this to the data will lead one to fit the fluctuations due to cosmic variance and experimental noise, with arbitrary improvement in the chi-square. Instead, one performs cross-validation: throw out some of the data (the validation set), fit the rest (the training set), and see how well the fit predicts the validation set. A very good fit to the training set that poorly predicts the validation set indicates over-fitting of noisy data. The final ingredient in the algorithm is a roughness penalty, a parameter that penalizes a high degree of structure in the functional form. By performing cross-validation as a function of this penalty, one can judge when the amount of freedom in the smoothing spline is what the data require without fitting the noise. A minimally parametric power spectrum reconstruction combined with a roughness penalty set by cross-validation thus provides a method of determining smooth departures from scale invariance which avoids two pitfalls. Firstly, a strong theory prior on the form of the power spectrum (e.g. the commonly used power-law prescription) can lead to artificially tight constraints on, or even a spurious detection of, a deviation from scale invariance that is due more to the strength of the prior than to that of the data. Secondly, simple binning techniques [1-5] or direct inversion [6-15] of the data to obtain the primordial power spectrum can lead one to fit noisy data with a large improvement in chi-square.¹ A minimally parametric approach combined with cross-validation avoids these issues, providing a way to determine the strength of the shape prior actually justified by the quality of the data. Cross-validation would also be helpful for alternative minimally-parametric methods [16, 17], e.g. in choosing the number of basis functions.
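The size of the effect quoted in the footnote on calibration uncertainty can be checked with a rough estimate. Assume, purely for illustration, of order N ~ 10³ degrees of freedom (our assumption; the text does not quote N) and that a good fit has χ² ≈ N:

```latex
\chi^2 \simeq N_{\rm d.o.f.}, \qquad
\delta\!\left(\chi^2/N_{\rm d.o.f.}\right) \simeq 0.03
\quad\Longrightarrow\quad
|\Delta\chi^2| \simeq 0.03\,N_{\rm d.o.f.} \approx 0.03 \times 10^{3} = 30 .
```

An improvement of this size in the chi-square is therefore comparable to the calibration uncertainty in χ² itself. For that reason it cannot, on its own, be taken as evidence for structure in P(k).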

In this work, we use the best available data over a wide range of scales, corresponding to the longest "lever arm" in wavenumber currently available, to reconstruct the shape of the primordial power spectrum in a minimally-parametric way. ESA's Planck satellite, which has already begun taking data, is expected to provide superior constraints [18] on the shape of the primordial scalar power spectrum by 2012. Our goal here is to establish a benchmark of what was known about the shape of the power spectrum before the Planck analysis.



II. METHODOLOGY 

We perform a minimally-parametric reconstruction of the primordial power spectrum based on the method of Ref. [22]. Since the simplest inflationary models, which are consistent with the data, predict the primordial power spectrum to be a smooth function, we search for smooth deviations² from scale invariance with a cubic smoothing spline technique (for details, see Refs. [15, 22, 24], which we only briefly summarize here). In this approach, one aims to recover a function f(x) from measurements f_i at n discrete points x_i.

¹ Given a calibration uncertainty in the covariance matrix leading to an uncertainty of 3% in the absolute calibration of χ²/d.o.f. (easily plausible with current data), it would not be surprising to see an improvement of Δχ² ~ -30 relative to a smooth power spectrum by "fitting the noise" with a power spectrum containing a high degree of structure.

² A Bayesian reconstruction technique has been proposed in Ref. [23] which also avoids overfitting of the data and is perhaps better suited to discovering local violations of scale invariance.

FIG. 1: (Left) WMAP 5 year data [19] and (right) external CMB data from ACBAR and QUaD [20, 21], showing knot placement (triangles, arbitrary normalization) and the cross-validation set-up. CV_A is red and CV_B is blue. We show only the temperature data here (in μK²), as the constraints on the power spectrum shape come mostly from the temperature data; in practice, for each data set we also use the polarization data, which are crucial in lifting degeneracies with the cosmological parameters. The light blue line is a concordance ΛCDM model.

Consider a description of f by a piecewise cubic spline F(x). It is uniquely defined by the values of F at the "knots" once we require continuity of F(x) and its first and second derivatives at the knots, together with two boundary conditions: we require the second derivative to vanish at the exterior knots. In our application, F(x) is the primordial power spectrum P(k), and the data are: the angular power spectrum of the 5 year Wilkinson Microwave Anisotropy Probe (WMAP5) cosmic microwave background (CMB) temperature and polarization [19], alone or in combination with higher-resolution, ground-based CMB experiments (QUaD and ACBAR [20, 21]), or with large-scale structure data: the Sloan Digital Sky Survey (SDSS) Data Release 7 (DR7) Luminous Red Galaxy (LRG) power spectrum [26] and the Lyman-alpha forest (Lyα) power spectrum constraints from Ref. [27]. This work thus represents a significant advance over previous work [25], with a new WMAP release (two further years of data and a significant advance in the understanding of systematic errors) plus substantial improvements in both ground-based CMB data and large-scale structure data.
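To make the spline description concrete, here is a minimal sketch of building P(k) from knot values with SciPy's `CubicSpline`. Its `bc_type="natural"` option imposes the vanishing-second-derivative boundary condition described above, and the interpolant is automatically continuous in F, F' and F'' at the knots. Several choices are our own assumptions and are not taken from the text: the knots are placed in x = ln k, the spline is fitted directly to P, and the function name and knot values are hypothetical. The sketch also shows how n_s(k), the quantity plotted in the results, follows from the spline via n_s - 1 = d ln P / d ln k.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def spline_power_spectrum(k_knots, p_knots):
    """Build callables P(k) and n_s(k) from knot values of a natural cubic spline.

    Illustrative assumptions: the spline variable is x = ln k and the spline
    is fitted directly to P, i.e. F(x) = P(k); knot positions are user-chosen.
    """
    x_knots = np.log(np.asarray(k_knots, dtype=float))
    # bc_type="natural" sets F''(x) = 0 at the two exterior knots; CubicSpline
    # already enforces continuity of F, F' and F'' at the interior knots.
    spline = CubicSpline(x_knots, np.asarray(p_knots, dtype=float), bc_type="natural")

    def power(k):
        return spline(np.log(k))

    def spectral_index(k):
        # P(k) ∝ k^(n_s - 1), so n_s(k) = 1 + d ln P / d ln k = 1 + F'(x) / F(x).
        x = np.log(k)
        return 1.0 + spline(x, 1) / spline(x)

    return power, spectral_index


# Usage: five hypothetical knots (in Mpc^-1) and a scale-invariant spectrum.
k_knots = np.logspace(-4, -1, 5)
power, n_s = spline_power_spectrum(k_knots, np.full(5, 2.4e-9))
n_s_values = n_s(np.array([1e-3, 1e-2]))
```

With constant knot values, the natural spline is constant, so n_s(k) = 1 everywhere. A tilted set of knot values instead yields n_s(k) < 1 across the knot range.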

We use 5 to 7 knots depending on the data set considered (see Table I for details; the locations of the knots in k space are indicated in Figs. 1 and 2). If the knot values were allowed infinite freedom and were set simply by minimizing the chi-square, the reconstruction would in general fit features created by the random noise present in the data. It is therefore necessary to add a roughness penalty, which we chose to be the integral of the squared second derivative of the spline function. The roughness penalty is weighted by a smoothing parameter: increasing the smoothing parameter effectively reduces the degrees of freedom, disfavouring jagged functions that "fit the noise".

TABLE I: The cross-validation set-up and the adopted number of knots for each data set used in the analysis.

Data set    CV_A    CV_B    # knots
WMAP5       yes^a   yes^a   5
QUaD        no      yes     6
ACBAR       yes     no      6
SDSS DR7    yes     no      6
Lyα         no      yes     7

^a Following the choice in Ref. [25]; see Fig. 1.

In generic applications of smoothing splines, cross-validation is a rigorous statistical technique for choosing the optimal smoothing parameter. Cross-validation (CV) quantifies the notion that, if the underlying function has been correctly recovered, it should accurately predict new, independent data. To make the problem computationally manageable, we opt for two-fold cross-validation, holding out n/2 of the n data points at a time: the data set is split into two halves, say A and B. A Markov chain Monte Carlo (MCMC) parameter estimation analysis (for a given smoothing parameter) is carried out on one half of the data to find the best-fit model. The negative log likelihood of the second half of the data given the best-fit model for the first half, CV_AB, is then computed and stored. This is repeated with the roles of the two halves switched, giving CV_BA. The sum, CV_AB + CV_BA, gives the "CV score" for that smoothing parameter. Finally, the smoothing parameter that best describes the entire data set is the one that minimizes the CV score. Table I gives details of the implementation. Note that, as in Ref. […], the basic cosmological parameters (Ω_b h², Ω_c h², Θ_A, τ) are varied in the MCMC, as are the values of the smoothing spline at the knots, which describe the primordial power spectrum. The MCMC is implemented with modified versions of the CAMB [28] and COSMOMC [29] packages, with very stringent convergence criteria. We now describe our treatment of the data.

FIG. 2: Large-scale structure power spectrum in units of (Mpc/h)³, showing knot placement (triangles, arbitrary normalization) and the cross-validation set-up. CV_A is red and CV_B is blue. Red points represent the LRG power spectrum from Ref. [26]. The Lyman-alpha measurement is represented by a filled box encompassing the constraints on the observed flux power spectrum from Ref. [27]. The light blue line is a concordance ΛCDM model.
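The two-fold cross-validation loop described earlier in this section can be sketched in a toy setting. The sketch replaces the MCMC over cosmological and knot parameters with a direct penalized least-squares fit. That shortcut works only because the toy model is linear in the knot values, unlike the real CMB likelihood, which needs a Boltzmann code such as CAMB. The function names, noise level, knot number and penalty grid are all hypothetical. The held-out halves are alternate data points, mirroring the alternate-bin split used for the WMAP5 temperature data. The score is the held-out chi-square (minus twice the log likelihood, up to a constant), summed over both directions, and the roughness penalty is the integral of the squared second derivative.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def basis_and_penalty(x_knots, x_data):
    """Linear representation of a natural cubic spline.

    Returns B with F(x_data) = B @ theta for knot values theta, and Omega with
    theta @ Omega @ theta = integral of F''(x)**2 between the exterior knots.
    """
    n_knots = len(x_knots)
    B = np.empty((len(x_data), n_knots))
    S = np.empty((n_knots, n_knots))  # column j: F'' at the knots when theta = e_j
    for j in range(n_knots):
        unit = np.zeros(n_knots)
        unit[j] = 1.0
        spline = CubicSpline(x_knots, unit, bc_type="natural")
        B[:, j] = spline(x_data)
        S[:, j] = spline(x_knots, 2)
    # F'' is linear between knots, so each interval contributes h/3 (a^2 + ab + b^2).
    Q = np.zeros((n_knots, n_knots))
    for i, h in enumerate(np.diff(x_knots)):
        Q[i, i] += h / 3.0
        Q[i + 1, i + 1] += h / 3.0
        Q[i, i + 1] += h / 6.0
        Q[i + 1, i] += h / 6.0
    return B, S.T @ Q @ S


def penalized_fit(B, y, sigma, lam, Omega):
    """Minimise chi^2(theta) + lam * theta^T Omega theta for diagonal Gaussian errors."""
    w = 1.0 / sigma**2
    lhs = B.T @ (w[:, None] * B) + lam * Omega
    rhs = B.T @ (w * y)
    return np.linalg.solve(lhs, rhs)


def cv_score(B, y, sigma, lam, Omega):
    """Two-fold CV score: fit half A and score half B (CV_AB), swap roles (CV_BA), sum."""
    idx = np.arange(len(y))
    half_a, half_b = idx[0::2], idx[1::2]  # alternate data points
    score = 0.0
    for train, test in ((half_a, half_b), (half_b, half_a)):
        theta = penalized_fit(B[train], y[train], sigma[train], lam, Omega)
        resid = (y[test] - B[test] @ theta) / sigma[test]
        score += np.sum(resid**2)  # -2 ln L of the held-out half, up to a constant
    return score


# Toy usage: smooth truth plus Gaussian noise, seven knots, a grid of penalties.
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 1.0, 200)
sigma = np.full_like(x_data, 0.05)
y = 1.0 + 0.2 * x_data + rng.normal(0.0, 0.05, x_data.size)
x_knots = np.linspace(0.0, 1.0, 7)
B, Omega = basis_and_penalty(x_knots, x_data)
lams = 10.0 ** np.arange(-6, 3)
scores = np.array([cv_score(B, y, sigma, lam, Omega) for lam in lams])
best_lam = lams[np.argmin(scores)]
```

Larger penalties shrink the fit toward a straight line in x, which is the null space of the roughness penalty. The CV score trades the bias of such an over-smoothed fit against the variance of a wiggly one.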

CMB Data: We use the v3p2 version of the WMAP5
likelihood function with standard options, with the temperature data divided into alternate (roughly equal signal-to-noise) bins for CV_A and CV_B respectively, exactly as in Ref. [25]. The polarization data are always used in both CV cases. For CV_A we use the ACBAR bandpowers from Ref. [20] in the range 550 < ℓ < 1950. For CV_B we use the Pipeline 1 QUaD bandpowers in the range 569 < ℓ < 2026 from Ref. [21] (see Fig. 1).

SDSS DR7 LRG Power Spectrum: The LRG data are used in CV_A with WMAP5 data. The data span the range of wavenumbers 0.02 < k < 0.2 h/Mpc. The likelihood function we use is identical to that presented in Ref. [26] (see Fig. 2).

Lyman-alpha Constraints: The Lyα data are used in CV_B with WMAP5 data (see Fig. 2). We use the publicly released likelihood function by A. Slosar [30] to obtain Lyman-α forest constraints. For this likelihood to be valid, the model P(k) must be well described by a three-parameter model (amplitude, spectral slope and running) at the Lyman-alpha forest scales, i.e. 0.3 < k < 3 h/Mpc. To check that this assumption holds in this k-range for our more general description of P(k), we extrapolated the P(k) from the Markov chains of Ref. [25] to the Lyman-α scales. In this k-range the resulting spline is well approximated by the prescription of Ref. [30], with residuals at the percent level, well below the intrinsic Lyman-α errors. With the more recent data sets we consider here, the approximation is expected to be even better.
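This residual check can be sketched as a least-squares fit of ln P to a quadratic in ln k (amplitude, slope and running), followed by inspection of the fractional residuals over 0.3 < k < 3 h/Mpc. The following are our own assumptions: the form of the fit, the pivot k_pivot = 1 h/Mpc, the function name and the toy spectrum. The actual likelihood code may parameterize amplitude, slope and running differently.

```python
import numpy as np


def running_fit(k, p, k_pivot=1.0):
    """Fit ln P = ln A + (n - 1) ln(k/k_pivot) + (alpha/2) ln^2(k/k_pivot).

    Returns the coefficients (ln A, n - 1, alpha) and the fractional residuals
    P / P_fit - 1. The pivot k_pivot is an illustrative choice.
    """
    x = np.log(np.asarray(k, dtype=float) / k_pivot)
    design = np.column_stack([np.ones_like(x), x, 0.5 * x**2])
    coeffs, *_ = np.linalg.lstsq(design, np.log(p), rcond=None)
    p_fit = np.exp(design @ coeffs)
    return coeffs, np.asarray(p) / p_fit - 1.0


# Toy spectrum on 0.3 < k < 3 h/Mpc with a small cubic term in ln k that the
# three-parameter model cannot capture exactly.
k = np.geomspace(0.3, 3.0, 60)
x = np.log(k)
p = 2.0e-9 * np.exp(-0.04 * x - 0.005 * x**2 + 0.002 * x**3)
coeffs, frac_resid = running_fit(k, p)
max_resid = np.max(np.abs(frac_resid))
```

In this toy case the leftover cubic structure leaves residuals well below one percent. That is the kind of criterion the text describes for trusting the three-parameter approximation.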



III. RESULTS AND DISCUSSION 

Our main results are presented in Fig. 3 for several data sets with increasing range in k: WMAP5 only, WMAP5 in combination with QUaD and ACBAR, and WMAP5 in combination with SDSS LRGs and Lyα. We show the reconstructed n_s(k) for ease of comparison with the standard power-law results. However, the quantity that was actually reconstructed, using cross-validation to find the optimal penalty, is the power spectrum.

The optimal penalty for WMAP5, λ_WMAP5, is higher than that found for WMAP3 by a factor of 25, and is consistent with the optimal penalty for WMAP3 in combination with CMB data at smaller scales [25]. The corresponding n_s(k) is shown in the top left panel of Fig. 3. The same optimal penalty is found for WMAP5 and WMAP5+QUaD+ACBAR (λ_WMAP5 = λ_CMB), and the latter reconstruction is shown in the top right panel of Fig. 3. For WMAP5+LRG+Lyα, we find that CV becomes less sensitive to the value of the penalty, and the dependence of the CV score on the penalty flattens out at λ_WMAP5+LSS = 0.2 λ_CMB. While this may indicate a preference for a less smooth P(k), the data cannot distinguish between λ_WMAP5+LSS and a penalty an order of magnitude higher. The reconstructed n_s(k) are shown in the left and right bottom panels of Fig. 3 for penalties λ_WMAP5+LSS and 10 λ_WMAP5+LSS, respectively. The dark and light blue regions enclose the best (ordered by likelihood) 95% and 68% reconstructions. The 95% constraints are not significantly broader than the 68% ones because the additional reconstructed spectra are simply more wiggly: the data do not allow them to deviate further from the best fit consistently across scales.
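Likelihood-ordered bands of this kind can be built from chain samples roughly as follows. This is a sketch under our own assumptions. Each sample's n_s(k) is evaluated on a common k grid, samples are ranked by log-likelihood, and the band is the pointwise minimum and maximum of the top fraction of samples. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def likelihood_ordered_envelope(curves, log_like, fraction):
    """Pointwise band enclosing the highest-likelihood `fraction` of curves.

    curves:   (n_samples, n_k) array, e.g. n_s(k) of each chain sample on a common k grid.
    log_like: (n_samples,) log-likelihood of each sample.
    """
    order = np.argsort(log_like)[::-1]  # best (highest ln L) first
    n_keep = max(1, int(np.ceil(fraction * len(order))))
    kept = np.asarray(curves)[order[:n_keep]]
    return kept.min(axis=0), kept.max(axis=0)


# Four toy "reconstructions" on a two-point k grid, best first.
curves = np.array([[1.00, 1.00],
                   [0.90, 1.10],
                   [0.50, 1.50],
                   [0.00, 2.00]])
log_like = np.array([0.0, -1.0, -2.0, -3.0])
lo68, hi68 = likelihood_ordered_envelope(curves, log_like, 0.68)
lo95, hi95 = likelihood_ordered_envelope(curves, log_like, 0.95)
```

By construction the 95% band contains the 68% band. How much wider it is depends, as noted above, on whether the extra curves are offset from the best fit or merely wiggly around it.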

Cross-validation is a useful tool to check for indications of unidentified systematic biases in the data. For example, in Ref. [25] we found that the 3 year WMAP data (WMAP3) by themselves favored a primordial power spectrum with a downward deviation from a power law at small scales (see Fig. 2 of Ref. [25]). However, this feature disappeared when WMAP3 was combined with other data sets that overlapped WMAP3 on the scales corresponding to the feature (see Figs. 3 and 5 of Ref. [25]), an inconsistency suggestive of a small residual systematic effect in the high-ℓ WMAP3 data. Ref. [31] argued, based on considerations of frequency dependence, that the unresolved residual point-source contribution to be subtracted from the raw C_ℓ should have been smaller by 28%, and its uncertainty larger by 60%, than the official WMAP3 values. To judge whether smoothing-spline cross-validation could give some insight into possible residual systematic errors, we investigated how much the point-source subtraction level would have to change for the aforementioned downturn at small scales to disappear from the reconstructed power spectrum. We obtained a point-source amplitude ~20% lower than the WMAP estimated value, which is tantalizingly close to the estimate of Ref. [31].

FIG. 3: Reconstructed spectral index n_s(k) for various data combinations: WMAP5 with optimal penalty (top left), WMAP5+QUaD+ACBAR with optimal penalty λ_CMB (top right), and WMAP5+LRG+Lyα for two values of the penalty (bottom, left and right). Dark and light blue regions correspond to the best 95% and 68% reconstructions. The solid black line is the maximum likelihood fit. For comparison, the dashed line corresponds to a scale-invariant P(k). See text for details.

In WMAP5, there is no longer any indication of deviations from a power-law primordial power spectrum, and the data require a smoother power spectrum (higher penalty) than WMAP3.

We find that WMAP5, CMB experiments at smaller scales, and the LRG power spectrum are all consistent with each other. With the addition of Lyα data, a lower penalty value is allowed. This could be a tentative indication of tension between Lyα and the other data sets, but not a very significant one: there is a cancellation between the effect of the penalty and the effect of the likelihood over a wide range of penalty values, as shown in the bottom panels of Fig. 3. In addition, as the LRG and Lyα scales do not overlap, we cannot exclude the possibility of a low-significance local feature in the power spectrum.

In Fig. 4 we show the reconstructed n_s(k) for the CMB and LRG data, with the optimal penalty determined by cross-validation. The cross-validation set-up for WMAP5 is the same as before, LRGs are added in CV_A, and QUaD+ACBAR are included together in CV_B. We have excluded the Lyα data, as it is the only non-overlapping data set. For comparison, we also show the 95% and 68% constraints [26] for WMAP5+LRG data when a power-law spectral index is assumed to describe the shape of the primordial power spectrum. We see no evidence in the CV reconstruction that any k-dependence of n_s is necessary to describe the data. While n_s = 1 is disfavoured, the significance of the departure from scale invariance is weaker than when the "inflation-motivated" power-law spectral index prior is adopted.

This minimally-parametric reconstruction highlights how constraints relax when generic forms of P(k) are allowed. While this reconstruction is in agreement with the inflationary prior, it illustrates that better data are needed to justify its adoption observationally. Forthcoming data from Planck will significantly reduce the current reliance on priors in our understanding of the shape of the primordial power spectrum. Future large-scale structure data and Planck will overlap over a decade in scale, offering extra consistency checks. Lyman-alpha data, on the other hand, offer the potential to extend the lever arm by at least another decade. We hope that the results presented here will form a basis for judging the robustness of our present knowledge when it is confronted with the precision measurements that are on the horizon.
FIG. 4: Reconstructed spectral index n_s(k) from WMAP5, ACBAR, QUaD and SDSS DR7 LRG data with the optimal penalty determined from cross-validation, excluding Lyα. The orange-red band shows the 95% and 68% n_s constraints [26] for WMAP5+LRG data with a power-law prior.